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Abstract — We propose to model the image differentials of 
astrophysical source maps by Student's t -distribution and to 
use them in the Bayesian source separation method as priors. 
We introduce an efficient Markov Chain Monte Carlo (MCMC) 
sampling scheme to unmix the astrophysical sources and describe 
the derivation details. In this scheme, we use the Langevin 
stochastic equation for transitions, which enables parallel draw- 
ing of random samples from the posterior, and reduces the 
computation time significantly (by two orders of magnitude). 
In addition. Student's f-distribution parameters are updated 
throughout the iterations. The results on astrophysical source 
separation are assessed with two performance criteria defined in 
the pixel and the frequency domains. 

Index Terms — Bayesian source separation. Multi-channel 
denoising, Metropolis-Hastings, Langevin stochastic equation, 
MCMC, Astrophysical images. Student's t-distribution. 



I. Introduction 

THE Bayesian framework, which enables the inclusion of 
prior knowledge in problem formulation, has recentiy 
been utilized to improve the performance of Blind Source Sep- 
aration (BSS) techniques. In the context of image separation, 
one obvious type of prior information is the spatial 1 1 1 or 
spatio-chromatic f2l| dependence among the source pixels. 

While there are three conditions that ensure separability 
of sources, namely, non-Gaussianity, non-whiteness and non- 
stationarity |3|, we choose to exploit spatial correlation (i.e. 
spatial non-whiteness). The prior densities are constituted by 
modeling the image differentials in different directions as 
Multivariate Student's i-distributions |4|. The t-distribution 
has some convenient properties for our model: If the degree 
of freedom parameter of the distribution goes to infinity, it ap- 
proaches a normal density; conversely, if the degree of freedom 
parameter equals 1, the density becomes Cauchy. Therefore 
the i-distribution is a flexible and tractable statistical model 
for data ranging from broad-tailed to normally distributed. 
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The first examples of use of t-distribution in inverse imaging 
problems can be found in fSl and ||6l. In ||6l, it is shown that 
the t-distribution approximates the distribution of the wavelet 
coefficients of an image more accurately. In recent papers, 
it has been used for image restoration Q and deconvolution 
f8l. Notice that the degree of freedom parameter of the t- 
distribution has the same role as the regularization parameter 
of the Mai-kov Random Field (MRF) models. The MRF prior 
with Cauchy density, which was first proposed in fTO'l for 
inverse imaging, was used in 1 1 1 for source separation. The 
model used in |[T1 is an approximation to the <-distribution, 
which is presented in Section IV-AI 

The <-distribution has already been used in Bayesian audio 
source separation ||9l to model the discrete cosine transform 
coefficients of the audio signals. It was reported that the 
<-distribution prior had improved the sound quality over 
the finite mixture-of-Gaussians prior In this study, to solve 
the Bayesian BSS problem for images without incurring in 
smoothing artifacts, we propose the t-distribution for modeling 
the local pixel differences. 

We use the joint posterior density of the complete vari- 
able set to obtain a joint estimate of all the variables. In 
this Bayesian approach, the BSS problem can be solved by 
maximizing the joint posterior density of the sources, the 
mixing matrix and the source prior model parameters lfT2ll . 
lT3l . A method for solving the joint posterior modal estimation 
problem is the Iterated Conditional Mode (ICM) method, 
which maximizes the conditional densities sequentially for 
each variable ifTTl . If the mode of the conditional density 
cannot be found analytically, any deterministic optimization 
method can be used |13|. However, under any non-Gaussian 
hypothesis, ICM does not guarantee a unique global solution. 

Another algorithm suitable for learning the Gaussian MRF 
is the Expectation-Maximization (EM) method. Using the 
Mean Field Approximation (MFA), the expectation step of 
the EM algorithm can be calculated analytically |14|. The 
MFA under Gaussian model assumption causes smoothing the 
edges in the image. The reason underlying these smoothing 
effects is that the Gaussian approximation violates the edge 
preserving property of the prior density, and the effect is 
proportional to the amount of noise. The image model with 
spatially varying variance parameter in variational Bayesian 
approximation can help overcome the smoothing problem |7|, 
f8| . In ITSl . ST4\ . deterministic optimization techniques have 
been used for the MRF. In a Gibbs sampling stochastic 
optimization procedure is used. Since it is not possible to draw 
samples in a simple way due to the MRF priors in Gibbs 
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distribution form, a Metropolis embedded Gibbs sampling has 
been adopted. In [Tl, it is reported that the Monte Carlo results 
with less image smoothing, but the cost of avoiding smoothing 
artifacts is a significant increase in the convergence time. 

In this study, we propose a more efficient Monte Carlo 
Markov Chain (MCMC) sampling method in lieu of random 
walk Metropolis. To produce proposal samples in parallel, 
we resort to the Langevin stochastic equation |fT6l . ifTTl . 
ifTS I. while the proposed samples are accepted or rejected by 
the Metropolis scheme. In statistical physics, the Langevin 
equation fTSl is used to describe the Brownian motion of 
particles in a potential field and has been used to obtain a 
smart MC algorithm in ifTTl . Another parallel sampling algo- 
rithm is the Hamilton Monte Carlo, which is the generalized 
version of the Langevin sampler [18 |. We conjecture that, with 
the samples produced in parallel by the Langevin equation, 
the convergence time of the algorithm will be significantly 
reduced. 

Parameter estimation in Bayesian edge preserving inverse 
imaging problems with Gibbs distributions is a troublesome 
process because of the partition function. Although there 
are some methods ||T9l , ||20| that calculate the parameters 
using Monte Carlo techniques, their computational burdens 
are prohibitive. One can resort to the Pseudo Likelihood 
(PL) approximation to make the partition function separable, 
which is more convenient for parameter estimation with the 
Maximum Likelihood (ML) method. In [21], two Bayesian 
approaches have been used to estimate parameters, namely, 
the Maximum-a-Posteriori (MAP) and evidence approaches. 
An approach to estimate the regularization parameter from 
the PL approximation has been recently proposed ll22l . The 
multivariate ^-distribution is also a PL approximation to MRF 
and has advantages over MRF in parameter calculations. 

There are two types of parameters in edge preserving 
inverse imaging. The first one is the adaptive edge preserving 
parameter, which is also known as threshold parameter We 
interpret the threshold parameter as the scale parameter of 
the t-distribution. The other parameter is the regularization 
parameter, which adjusts the balance between the likelihood 
and the prior. The regularization parameter corresponds to the 
degree of freedom (dof) parameter of the t-distribution. To 
estimate the scale and dof parameters of our t-distribution, we 
use ML estimation via EM algorithm as in |30|. A similar 
approach has been used in |7| and |8|. 

In a comparative study among image source separation algo- 
rithms |1 1, we have found that the Bayesian formulation with 
MRF prior and Gibbs sampling outperformed the heuristic and 
the other Bayesian approaches. This work is based on [l], and 
aims to achieve a much faster MCMC implementation without 
compromising its good performance. With this goal in mind, 
we have been testing our algorithms on a current problem of 
modern astrophysics: the separation of radiation source maps 
from multichannel images of the sky at microwave frequen- 
cies. In particular, we have been applying our algorithm to 
the separation of the Cosmic Microwave Background (CMB) 
radiation from the galactic emission (synchrotron and thermal 
dust emissions) using realistically simulated sky maps. 

The original contributions of the paper hinge on two aspects. 



First, we adopt the i-distribution to build a prior model of the 
source maps. More specifically, the ^-distribution is used to 
model the differentials of the sources, as done by the first-order 
homogeneous MRF models. This is advantageous because the 
flexibility of the t-distribution allows each source differential 
to assume a different model whether impulsive or Gaussian, 
simply by setting the dof parameter. The second contribution 
is the introduction of the Langevin sampling scheme in lieu 
of the random walk Gibbs sampling. As opposed to pixel- 
by-pixel sampling, Langevin sampling generates the samples 
in parallel, thus leading to much faster convergence. Further- 
more, the samples are drawn in an informed way, since their 
generation follows the gradient descent direction on an energy 
surface. 

Section |ll] is a brief introduction to astrophysical sources. 
In Section |llll the component separation problem in observa- 
tional astrophysics is stated. Section ITV] lavs out the Bayesian 
formulation of the problem. Section [V] presents the derivation 
steps of the adaptive Langevin sampler algorithm along with 
the EM parameter estimation method. The simulation results 
are presented in Section [VT] and interpreted in Section IVIII 

II. An Introduction to Astrophysical Sources 

Here, we only give a brief description of the astrophysical 
radiations considered, referring the interested reader to f23\ 
for details. We are interested in the frequency range 30 to 
1000 GHz where the dominant diffuse radiations are the CMB, 
the galactic synchrotron radiation and the thermal emission 
from galactic dust. Studying this radiation would help us to 
understand the distribution and the features of interstellar dust 
in our galaxy. 

The most interesting astrophysical source in the microwave 
region of the electromagnetic spectrum is the CMB, a relic 
radiation originating from the time when the Universe was 
300.000 years old. The discovery of the CMB is one of the 
fundamental milestones of modern cosmology and its study 
allows us to determine fundamental parameters such as the 
age of the universe, its matter and energy composition, its 
geometry and many other relevant cosmological parameters. 
CMB should be a blackbody radiation at a temperature of 
2.726 K, thus its emission spectrum should be perfectly 
known. The CMB emission dominates over the other sources 
at frequencies around 100 GHz. The CMB temperature is not 
perfectly anisotropic. Standard cosmological models predict 
that the CMB anisotropy is Gaussian distributed, although 
some alternative models permit a certain degree of non- 
Gaussianity. Current observations are compatible with the hy- 
pothesis of Gaussianity. Two all-sky surveys have been made 
on CMB so far, by NASA's satelUtes COBE |32| and WMAP 
|33|. A European mission whose data will be highly accurate 
and spatially resolved, Planck [|23l . is about to provide its first 
full-sky coverage maps. 

The CMB signal is mixed with other astrophysical sources 
of electromagnetic radiation. Relativistic electrons being accel- 
erated by magnetic fields in the Galaxy give rise to synchrotron 
emission, which dominates over the CMB in regions close 
to the Galactic plane especially at low frequencies (< 200 
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(a) CMB horizontal 



(b) CMB vertical 
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(c) Synchrotron horizontal 



(d) Synchrotron vertical 



(e) Dust horizontal 



(f) Dust vertical 



Fig. 1. Fitting plots of the image differential histograms (dots) of CMB, 
synchrotron and dust components in horizontal and vertical directions. The 
fitted functions propoilional to Gaussian (dot line), Cauchy (dash-dot hne) 
and t-distribution (dash line). 



GHz). According to observations in other frequency bands, 
the synchrotron emission spectrum follows a power law with 
a negative exponent, whose value is presently known with 
high uncertainty. Synchrotron is the dominant radiation in the 
low-frequency bands of our range of interest. Inter-stellar dust 
grains are heated by nearby stars and re-emit thermal radiation 
in the far infrared region of the electromagnetic spectrum. 
Dust radiation is dominant in the high end of our range. 
In particular, it is almost the only significant contribution to 
the total diffuse radiation between 800 GHz and 1000 GHz. 
Its emission spectrum should follow a greybody law, with 
unknown spectral index and an additional degree of freedom 
given by the thermodynamical temperature of the dust grains. 
For a short review on CMB astronomy, see t24J . 

There are other astrophysical sources present at microwave 
frequencies, such as free-free emission due to free electrons, 
anomalous dust emission and radiation coming from extra- 
galactic sources, but their relevance is smaller. In this work 
we will focus on the main three astrophysical sources present 
in CMB experiments: CMB, synchrotron and dust. 

In order to justify our adoption of the Student's t- 
distribution in a relevant case, we have selected a 15° x 15° 



TABLE I 

Root Mean Square Error (RMSE) of fitting of the image 
differential histograms of cmb, synchrotron and dust 
components. the last column shows the estimated dof 
parameters of the t-distribution. 

Horizontal direction 





Gaussian 


Cauchy 


(-distribution 


dof 


CMB 


15.11 


30.84 


15.47 


25.71 


Synchrotron 


23.70 


30.06 


15.70 


3.59 


Dust 


67.99 


33.21 


13.84 


1.81 


Vertical direction 




Gaussian 


Cauchy 


(-distribution 


dof 


CMB 


15.75 


29.70 


15.83 


15.75 


Synchrotron 


19.11 


31.00 


18.39 


19.11 


Dust 


63.73 


68.19 


54.42 


2.18 



sky patch, located at 0° galactic longitude and 40° galactic 
latitude, discretized into a 512 x 512-pixel map. Within this 
patch, we have introduced simulated CMB, synchrotron, and 
dust radiation maps (as in |26|). We have computed the source 
image differentials for horizontal and vertical directions, and 
estimated their empirical distributions. We have fitted three 
different functions to the empirical distributions of the image 
differentials of the astrophysical sources with nonlinear least 
square method using the Curve Fitting Toolbox of MATLAB. 
These functions are proportional to Gaussian, Cauchy and t- 
distribution. Fig. [T] shows the fitting results for CMB, syn- 
chrotron and dust images. Table Ulists the residual Root Mean 
Square Errors (RMSE) of the fits. The Gaussian gives the best 
fit for CMB because the CMB is theoretically distributed as 
a Gaussian |24|. Overall, the t-distribution appears to be a 
good choice for modeling the image differential statistics in 
horizontal and vertical directions of all the components. The 
estimated dof parameters of t-distributions show that indeed 
the proposed model assumes from impulsive to Gaussian 
characteristic underlying each component. If the component 
is Gaussian as CMB, the dof parameter becomes bigger and 
if it is impulsive, the dof parameter becomes very small. 

III. Component Separation Problem in 

OBSERVATIONAL ASTROPHYSICS 

Virtually any application in observational astrophysics has 
to do with problems of component separation. Indeed, all 
the astrophysical observations result from the superposition 
of the radiation sources placed along the line of sight. While 
very distant sources can be distinguished by the redshift 
analysis, for nearby sources this is not possible. In fact, 
physically distinct sources can sometimes be found within a 
close range of each other Furthermore, high sensitivity and 
high resolution measurements can give rise to source mixing 
problems even in the cases where the radiation under study 
is dominant over interfering radiations. Apart from redshift 
analysis, useful methods to distinguish between superimposed 
physically different radiations include spectral analysis and 
morphological analysis. In this paper, we only treat the former 
approach, exploiting the differences in the emission spectra 
shown by physically distinct radiation sources. This implies 
that the separation must be done on the basis of measurements 
made at different frequency bands. 
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We assume that the observed images, yk, fee {1,2,..., K}, 
are linear combinations of L source images. Let the fcth 
observed image be denoted by yk.i, where i G {1, 2, . . . , N} 
represents the lexicographically ordered pixel index. The im- 
age separation problem consists in finding L independent 
sources from K different observations. If s; and denote 

X 1 vector representations of source and observation images, 
respectively, then the observation model can be written as 



yfc 



L 

E 

1=1 



QkASl + rifc, 



k = l,.. 



(1) 



where nfc is an iid zero-mean noise vector with S = ct^Iat 
covariance matrix and I^r is an identity matrix. Although the 
noise is not necessarily homogeneous in the astrophysical 
maps, in this study we assume that the noise variance is 
homogeneous within each sky patch and is also known. 

Since the observation noise is assumed to be independent 
and identically distributed zero-mean Gaussian at each pixel, 
the likelihood is expressed as 

K 

P(yi:K|Sl:L,A) CX J|exp{-W^(Sl:L|yfc,A,CT2)}(2) 



k=l 



t^(si:i|yfe,A,a2) = 



ll(yfc - Y.i=iak,isi 



(3) 



where the mixing matrix A contains all the mixing coefficients 
ak,i introduced in ([T]). 

For many purposes, a mixing model of the type ([T]i is 
considered to fit reasonably well to an astrophysical obser- 
vation. The details on how to get an equation similar to ([TJ 
from the physics of the problem can be found in |25|. Here, 
we only summarize the main assumptions made with this 
purpose. First, we assume that the superposition of the signals 
originating from different sources is linear and instantaneous. 
In the astrophysical case, this assumption is clear, since the 
physical quantities to be measured are superpositions of elec- 
tromagnetic waves coming, for any bearing, exactly from the 
same line of sight without any scattering or diffraction effect. 
The second assumption in modeling astrophysical observations 
is that each source has an emission spectrum that does not 
vary with the bearing. This assumption implies that individual 
radiations result from the product of a fixed spatial template 
and an isotropic emission spectrum. Both assumptions need 
closer attention. The precise emission spectrum generated by 
any physical phenomenon depends on many quantities that 
may not all be distributed uniformly in the sky. Although in 
many applications the isotropy assumption has been adopted 
successfully, in many other cases the space-variability of the 
radiation sources must be taken into account to allow a good 
separation to be performed. Furthermore, if the effect of the 
telescope is taken into account, then the instantaneous model 
is no more valid since, for the finite aperture, the light captured 
in a fixed direction in the telescope does not come from that 
direction alone. In formulas, model ([T]l becomes 



yfc = hfc * yfc = hfc * ^ afc,;S; 



nfc 



(4) 



where the asterisk means convolution, and hfc is the telescope 
radiation pattern in the fc'th observation channel. Note that, if 
hfc is the same for all the channels, © can be written as 



L 

Yk =^afc,ih*s; + nfe 

1=1 



L 

E 

1=1 



a-k.isi + Hfc 



(5) 



1=1 



and the problem is again instantaneous for the modified 
sources s;, which are the physical sources smoothed by the 
common radiation pattern h. Unfortunately, especially in the 
radio- to millimeter-wave ranges, the telescope aperture de- 
pends strongly on frequency, and model (|5]l cannot be adopted 
directly, unless the observed signals are preprocessed to reduce 
their angular resolution to the worst available (see L26J ). 

Since no imaging system can achieve an infinite resolution, 
we should use ^ as our generative model. However, to avoid 
problems with the convolutive mixtures, we assume to have a 
telescope with the same radiation pattern in all the channels, 
so as to be able to use model (|5]l. Note that this can always be 
obtained by preprocessing, provided that all the beam patterns 
are known. Hereafter, as it will not cause any ambiguity, we 
drop the tilde accent from the symbols used to denote the data 
and the source vectors. 

IV. Source Separation Defined in the Bayesian 
Framework 

A. Source Model 

Neighbor pixels in our images have strong dependency. This 
is demonstrated, for example, in Fig. |2d) where the scatter- 
plot shows the first order right neighbor pixels of the dust map 
shown in Fig. EJa). The dependency decreases in the high 
intensity region. This region in the scatter-plot corresponds 
to spatially localized structures with high image intensity of 
the map in Fig. |2a). The existence of a small number of 
point-like structures in the maps indicates that the dependency 
assumption is valid. In view of this, we can write an auto- 
regressive source model using the first order neighbors of the 
pixel: 

s; = ai^dGdSi + U^d (6) 

where d E {!,..., D} denotes one of the main directions 
(left, right, up and down) and D = 4 is the cardinality of 
the set of image differential directions. Matrix Gd is a linear 
one-pixel shift operator in direction d, ai,d is the regression 
coefficient and the regression error ti d is an iid t-distributed 
zero-mean vector with dof parameter /?/ ^ and scale parameters 
Std, T{ttd\0,SLdlN,l3i,d)- We can justify the iid assumption 
of ti,d by plotting (see Fig. I2e)) the values in t; ^ versus 
its first order neighbors, G^t; rf. We can interpret t; ^ as a 
decorrelated version of s;. Fig. |2b) shows t; ^ for d = 1. By 
comparing Fig. |2jd) and (e), we can say that ti^d is spatially 
more independent than s;. 

If the image s; were Gaussian distributed, then the regres- 
sion error would also be Gaussian. However in real images, 
the regression error is better modelled by some heavy-tailed 
distribution. The ^-distribution can conveniently model the 
statistics of data whose distribution ranges from Cauchy to 
Gaussian, and therefore it is a convenient model for the 
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(b): Right ti = s - Gis and (c): Up 
(d): Scatter-plot of the first order right 



Fig. 2. (a): The synchrotron map s, 
t3 = s — G3S difference maps of (a), 
neighbor pixels of (a), s and Gis. (e): Scatter-plot of the first order right 
neighbor pixels of (b), ti and Giti. (f): Scatter-plot of the right difference 
(b) versus up difference (c), ti and t^. (g): Scatter-plot of the right difference 
(b) versus left difference (c), ti and t4. 



a weak dependence between right and left difference images, 
we maintain the independency assumption to constitute the 
products of the regression error probabiUties model. A similar 
approach to constitute a single prior by multiplying different 
individual priors can be found in fTT\. In this study, we use the 
left/right and up/down differences jointly in the prior model 
to balance their contributions. Otherwise, we could obtain 
directionally biased results. 

The t-distribution can be written in implicit form by using 
a Gaussian and a Gamma density. The i-distribution has a dof 
parameter /3, which is itself governed by Gamma distribution 
with parameter (5/2. The f-distribution has the following form 

mo): 



PitLdWLd,Si^d)pi'^l,d\f^Ld)diyiM 



1^1, d 



(8) 



The interpretation of this equation is that if ti,d\Si,d, i^i.d is 
distributed with a normal density Af{ti,d\0, Si^dXN /'^i,d) and 
the parameter i^i ^ has a Gamma prior as G{i'i.d\Pi.d/'^, Pi.d/^), 
the distribution of ti d\l3i,d, 6i.d becomes a i-distribution such 
that T(t; (j|0, (5; dliv, A.d)- The representation in ([S]! is a 
particular case of the Gaussian Scale Mixture (GSM) densities. 
The ML estimation of the parameters ai^d, Pi.d and Si^d using 
the EM method ||30l is given in Section IV Bl 



statistics of the high spatial frequency contents of images, 
such as regression errors |6|. The scale parameter 61, d can 
be assumed as a space-varying parameter to model the highly 
non-stationary sources, i.e. sparse sources, but homogenous 
variance assumption has been observed to be adequate for 
diffuse astrophysical source images for the image patch sizes 
that we use in the simulations. We use a homogeneous variance 
since, in our case, the increased complexity derived from 
inhomogeneity is not justified by a significant improvement 
in performance. 

The regression error t( ^ represents the directional image 
differential in the direction d. The multivariate probability 
density function of an image modelled by a i-distribution can 
be defined as 



p(ti,d|ai,d,/3i,d,(5i,(i) 



T{{N + Pi^d)/2) 



r(A,rf/2)(7rA,d(5z,rf)^/2 



1 



4>disi,ai,d) 
Pi,dSi,d 



-(A'+A.d)/2 

(7) 



where (f>d{si,ai^d) = \\U,d\\^ = ||s; - ai.dGdSiW^ and r(.) is 
the Gamma function. We can write the density of s; by using 
the image differentials in different directions, assuming direc- 
tional independence, as p(si|e) = Yld=iPi^i.d\aLd, PiM: Si^d) 
where 9 = {ai.,L,i:D, [Si-.la-.d, Si:l,1:d}- We can simply 
justify this assumption by plotting the horizontal regression 
error versus the vertical one as shown in Fig. |2f). We can 
observe in this figure that the scatter plot of the regression 
errors in left and up directions is almost circular and this 
justifies our assumption of independence. Fig. |2jg) shows the 
scatter-plot of right difference versus left difference. In spite of 



B. Posteriors 

The joint posterior density of all the unknowns in the BSS 
problem can be written as: 

p(si:L,A, e|yi:K) ocp(yi:K|si:L, A)p(si:L,A, 9) (9) 

where p(yi:x|si:L, A) is the likelihood and p(si:L,A, 8) 
is the joint prior density of unknowns. The joint prior 
can be factorized as p(si:L|ai:L,i:_D, /3i:L4:D, ^1:L4:d) p(A) 
p(/3i:L,1:d) p((5i:l,i:_d) p{ai:LMD)- Furthermore, since the 
sources are assumed to be independent, the joint probability 
density of the sources is also factorized as p{si-l\&) = 

Mathematically, we can assume uniform priors for ai d G 
(—1,1), Si^d G (0,00) and a^j e (0, 00), because at/s 
are always positive. The practical usage of these priors is 
explained in Section lV-DI We use a conjugate Gamma prior for 
Pi.d ^ ^(1/2,2 X 10~^). We have determined the respective 
parameters experimentally. The conditional posteriors of all 
model parameters are written as 

P(afc,i|yi:K,Sl:L, A_afc,,,6) OC p(yi:i^|Sl:L, A) 
P(ck;,£i|yi:A',Sl:L, A, e_a,^) OC p(t;,d|6) 

P(/3i,d|yi:K,Sl:L, A,e_/3, J OC p{tl^d\Q)p{l3l,d) (10) 

P{Sl,d\yi:K,Sl:L,A,Q^S,^a) « p{tl,d\0)piSLd) 

P(Si|yi:K,S(l:L)_i,A, 6) OC p(yi:K|Sl:I,, A)p(Si|e) 

where —variable expressions in the subscripts denote the 
removal of that variable from the variable set. The parameters 
a, (3 and S have size Lx D, A has size Kx L and the sources 
have size Lx N. Overall there are {3D + K + N)L unknowns. 
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For parameters ai^d, Si^d and /?; we exploit the EM 
method. To estimate the source images, we use a version of 
the posterior p(s;|.) augmented by auxiliary variables and find 
the estimation with a Langevin sampler The details are given 
in Section FV] 

V. Estimation of Sources and Parameters 

In this section, we give the details of the estimation of the 
sources and the parameters. 

A. Sources 

We modify the posterior densities of the source images 
p(s;|0) to obtain a more efficient MCMC sampler In the 
classical MCMC schemes, a random walk process is used 
to produce the proposal samples. Although random walk is 
simple, it affects adversely the convergence time. The random 
walk process only uses the previous sample for producing a 
new proposal. Instead of a random walk, we use the Langevin 
stochastic equation, which exploits the gradient information 
of the energy function to produce a new proposal. Since the 
gradient directs the proposed samples towards the mode, the 
final sample set comes mostly from around the mode of the 
posterior 1.281, ll29l. 

The Langevin equation can be obtained from the total 
energy function. We first define everything in continuous time 
to give the derivation steps of the Langevin equation, then we 
transfer them into discrete time. To obtain the total energy 
function, we introduce a velocity parameter v; (t) — dsi (t) / dt 
to define the kinetic energy such that 



(11) 



where M is a diagonal matrix whose diagonal elements 
correspond to mass parameters m; „ for pixel index n = 
1,...,N. Using the velocity parameter v;, the modi- 
fied version of the posterior density in (fTOl i is writ- 
ten as p(si,Vi|yi:^,S(i:i)_,,A, e,Mi) cx p(yi:K|si:L, A) 
p{si\Q)p{vi\'M.i). More explicitly, it can be written as 



p(si,Vi|yi:K,S(i:L)_;,A, e,M) cx 
exp{-(W^(si,i|A) + U{si\Q) + K{vi\M))} 



(12) 



where the energy function U{si\Q) of a source image can be 
written in terms of image differentials t; ^ as 



D 



u{si\e) = J2piti,dm. 



(13) 



d=l 



where the function p{ti^d\'d) is proportional to the negative 
logarithm of the t-distribution in that is. 



p(ti,<i|e) 



loa 



1 



(14) 



and the function log[l + (l)d{suai^d) / PudSud] is the regular- 
ization function proposed in ifTOl . The terms [N + l3i,d)/'2, 
and Pi^dSi.d correspond to the regularization and the thresh- 
old parameters, respectively, used in edge preserving image 
reconstruction. 



The energy function VF(si:l|A) was defined in (O. The 
total energy function is proportional to the negative logarithm 
of the posterior In summary, the three terms correspond, 
respectively, to the fit to data and to the inertial and the 
kinetic energy terms. We can define the Lagrangian function: 
L{si{t),vi{t)) = K{wi) - W{si..l) - U{si) and write the 
Lagrange-Euler equation for the Lagrangian as follows 

d_ r dL{si{t),viit)) \ ^ dL{si{t),Mt)) 
dt \ dvi J dsi 



(15) 



'dt 



d_ 
dsi 



Eisi). 



(16) 



where E{si.l) = W{si.l) + U{si). If we discretize the 
dynamics in ( fT6] l and velocity v; (t) using the Leapfrog method 
fTSi, we obtain the following three-step iteration 



k+l 



'I 



(17) 

(18) 
(19) 



where g(s*^^^) = [Vs,i;(si:L)]g^^^gfc^, V^, is the gradient 
with respect to S; and t; is the discrete time step. If we define 
a diagonal matrix D,^ = t;M; ^, so that, for the nth pixel, 
the diffusion coefficient is D/(n,n) — Tf/mi,n- Matrix D 
is referred to here as the diffusion matrix, and is derived in 
Section IV-All Instead of this step scheme, we use the one- 
step Langevin difference equation. To obtain the single step 
Langevin update equation for S/, we substitute ( fTTl ) into ( fTSl ). 



-.k+l 



= sf-iDzg(stJ + DfMfvf 



(20) 



This form is also used in 11281 . ||29| . The samples are 
produced by using this first order equation, and then they are 
tested in the Metropolis-Hastings scheme. 

If we assume the transitions in ( |20l ) as a Wiener process 
and take into account the fact that the velocity vector v; is 
independent of the source vector Sj, ifTSl , then its probabil- 
ity density function can be set as a multivariate Gaussian 
as pv,(vi) - (|M,|/2^)*exp{-ivf(i)M,v,(i)}. We can 
produce a random sample from this probability such that 
V; = ^ w; where w; is a zero-mean Gaussian vector with 
identity covariance matrix A/'(w;|0,I). If we substitute this 
random sample into (l20l i. we obtain the associated Langevin 
equation 

sf+i ^ sf _ \T>ig{siL) + Df (21) 

Since the random variables for the image pixel intensities 
are produced in parallel by using ( 1211 1. the procedure is faster 
than the random walk process adopted in [[l]. The random 
walk process produces local random increments independently 
from the neighbor pixels and the observations. In the Langevin 
sampler, the samples are generated in an interrelated manner 
and in terms of the descent of an energy function that reflects 
the goodness of the model fit. Once the candidate sample 
image is produced by (ISTT i. the accept-reject rule is applied 
independently to each pixel. In the case of random walk. 
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TABLE II 

Metropolis-Hastings algorithm for a source image, u: uniform 

POSITIVE random number IN THE UNIT INTERVAL; z: GENERATED 
SAMPLE VECTOR TO BE TRIED; (/3{2„, S*^^) : ACCEPTANCE RATIO OF THE 
GENERATED SAMPLE. 



1) W; ~ A/'(W;|0,I) 

2) H(sf)^[diag{H(sO},,^,.]-i 

3) ^2[H(sf)]-i 

4) g{4i)^ [Vs,S(si:i)],^_^^,. 

^-^ 1:L ^ 

5) produce z i — sf — |Dig(sJ.^) + Dj' w; from (2l\ . 

6) for all pixel n = 1, . . . , N 

a) calculate (/^(zn, 

b) if <p{z„,sf j > 1 then sf +i = z„ 
else produce n ~ (7(0, 1). 

if u < (p{z„, sfj then sf+^ - ~ 
else s 

c) n + 1 



fc+l 

l.n 

next pixel 



l,n' 
.k 
l,n 



p(s;,v,|yi.K,S(i.i)_;, A, e,M), we obtain the following 
equation 

{E{si + As;)) = {E{si) + WE{sifAsi + ^Asf H(sOAs,) 

(25) 

where H(s/) is the Hessian matrix of E{si) with respect to S;. 
From this equation, the optimum infinitesimal As; is found as 
As, = -[(H(sO)]-i(V£;(sO). 

If we also take the expectation of both sides of (|20| |. we 
obtain 

1. 



= (sf) - -D,g(st\ J 



(26) 



and comparing (s^ 



A;+l\ _ 



As; with 



we write 



D/ff(s^,i) = -2[{B.{si))]-^{VE{si)). Rather than the ex- 
pectation of the inverse of Hessian matrix, we use its diagonal 
calculated by the value of s; at the discrete time k as 



we would produce the candidate sample pixel and apply the 
accept-reject rule. The sampling of the whole image would 
be completed by scanning all the pixels in a sequential order 
as in Gibbs sampling. Since each pixel has to wait the 
update of the previous pixel, this procedure is very slow. In 
random walk, candidate pixels can be produced in parallel 
but, producing a candidate sample for the whole image using 
random walk is not a reasonable method because hitting the 
right combination for such a huge amount of data (i.e. w 10^) 
is almost impossible. By Langevin sampler, the likelihood of 
approximately hitting the right combination at any one step is 
much higher 

After their production, the samples are tested via 
Metropolis-Hastings |34| scheme pixel-by-pixel. The accep- 
tance probability of any proposed sample is defined as 

^^^{'Pisi+\slJ,l}, where 



where AE{sft') - Eisft^s 



l,n ■:''{l:L)-l,nl 



E{si 



L.n) 



(22) 



and 



^(sll,„) = W^(si':L,«) + U{slJ. For any single pixel, 
U{si^n) can be derived from ( fT3] l and (fl4l i as 



" 1 

E- 



log 



d=l 



<t)d{si,n,ai,d) 

Pl,d5l.d 



(23) 



The proposal density <z(sf^^|s^jj is obtained, from (l2Tl l. as 



^5(4., J,- 



(24) 



One cycle of the Metropolis-Hastings algorithm embedded 
in the main algorithm, for each source image, is given in Table 

m 

1) Diffusion Matrix: In this section, we give a method 
to find an optimum diffusion matrix D. The method must 
ensure that the produced sample s^^^ comes from the 
joint conditional distribution p(s/, v/jyi-i^, S(i.i)_;, A, Q, M) 
introduced in ( fT2] i. If we write the Taylor expansion of 
E{sf) with the infinitesimal As; and take the expec- 
tation of both sides with respect to the joint density 



(27) 



where H(sf) = [diag {H(s;)}g^^gfc]"^ and diag{.} operator 
extract the main diagonal of the I-tessian matrix. From dZTl) . 
we can find the diffusion parameter as 



2[H(sf)]-i. 



(28) 



This approximation is justified if H(s;) is strongly diago- 
nally dominant. 

B. Parameters of t-distribution 

We can write the joint posterior of the parameters ai^d, Pi,d 
and 5i^d such that p{ai^d, l3i,d,5Ld\U.d,Q -{ai,a,i3i,a,Si.d}) = 
p{ti^d\Q)p{l3i^d)p{6i,d). Using the likelihood p{ti.d\Q) in dD 
and the priors of the parameters, we can find the MAP esti- 
mates of the parameters of the t-distribution by EM method. 
Instead of maximizing the log {p{ti^d\Q)p{PLd)p{5i a)}, we 
maximize the following function 



log 



p{ti^d\Q)p{Pi,d)p{^i,d) 

p{viM^,Q'^) 



■ p{vi,dKd,Q^)dvi,d (29) 



= {\og{p{ti^d\Q)p{Pi4)p{5Ld)}) 
- {logpMtid,®^)) 



where p{vi,d\t\ d^^^) posterior density of the hid- 

den variable vi d conditioned on parameters estimated in the 
previous step k and {.)^^ ^^.t represents the expectation 

with respect to i/; ^|tf ^, 8'^. For simplicity, hereafter we use 
only (.) to represent this expectation. The parameter vi^d is a 
hidden (or latent) variable that changes the scale of the Gaus- 
sian density N{tid\0T5^ d^N /vi.d) and has a Gamma prior 
Q{vi,d\(3i dl'^i f^f d/"^)- exploiting vi d, we can define the t- 
distribution as a scale mixture of Gaussians as in (|8]l. The sec- 
ond term on the righthand side of ( |29] l, — (logp(i^;,d|tf ^, 6'')), 
corresponds to the entropy of the posterior density of vi^d, and 
is independent of the unknowns, and the function 



0(9; e^-) = {\og{p(tiAQ)p{PM5Ld))) 



(30) 
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The aim is to find the maximum of Q{Q; with respect 
to 9; 

e'^'+i = argmaxQ(e;e*=) (31) 

In the E (expectation) step of the EM algorithm, we must 
calculate the expectation {.)^^ ^j^-t Qk. For this purpose, we 
find the posterior density of 



g 1 viA — o — , — 1 1 + gk 

'"^l.d l,d 



The expectation of vi^d is 



Pi. 



1 



(32) 



(33) 



''l,d \ Pl,d^l,d / 

In the M (maximization) step, (|30] | is maximized with 
respect to 8. To maximize this function, we alternate among 
the variables ai^d, l3i,d and Si^d- After taking the logarithms 
and expectations in (|30] |. the cost functions for ai^d, Pi.d and 
6i.d are written as follows 



J^Ld) 



4>d{shai^d) 



N . 



2Si,d 

^ 4'd{si,ai^d) 
'2Si,d 



(34) 
(35) 



QiPLd^Q") - -iogr(%^) + (^^-i) (logi.,^,) 



-^logA., 



■log 2 



■^M^ (l + ^f^) - 0.002/3,, 



The solutions to (O and (l35T l can be easily found as 



^[G'dGdSi 

(t>d{suaiA) 
N 



(36) 

(37) 
(38) 



The maximization of ( I36b does not have a simple solution. 
It can be solved by setting its first derivative to zero: 



0.002 = 



(39) 



where V'i(-) is the first derivative of logr(.) and it is called 
digamma function. 

C. Parameters of the Mixing Matrix 

We assume that the prior of A is uniform between 
and oo. The conditional density of ak,i is expressed as 

p{akd\yi:K,Q-ak.i) °^ -P(yi:i^|0*)- ^rom ©, it can be seen 
that the conditional density of ak,i becomes Gaussian. The 
parameter ak^i is estimated in each iteration as 

1 ^ 
ak,i = -j^sjijk - ^ ak,iSi)u{ak,i) (40) 

where u{ak.i) is the unit step function. 



TABLE III 

One cycle of Adaptive Langevin Sampler for source 

separation. the symbol < denotes analytical update, the 

symbol i denotes update by finding the zero root and the 

symbol ~ denotes update by random sampling. 

Find the initial mixing matrix (i.e. FDCCA 131)). 
Find the initial source images using the LS solution. 
Initialize the parameters ^, /3f ^ and 5^^ 
for all source images, I = 1 : L 
for all directions, d = 1 : D 



Wi.d) 



1 + 



/3, 



(i^i.d) ]v 

'0 [-l/'l(%i)+log 



0.002 = 0] 



' /3l,d 
for all pixels, n = 1 : N 

Using Metropolis-Hastings method in Table HTl 

~ {p(^!,n|yi:K,ei 

for all elements of the mixing matrix, (fc, 1) = (1, 1) : {K, L) 



O-kd 



D. Adaptive Langevin Sampler Algorithm 

The proposed Adaptive Langevin Sampler algorithm is 
given in Table |lll] The symbol < — denotes analytical update, 
the symbol i — o denotes update by finding the zero root and 
the symbol ^ denotes the update by random sampling. The 
sampling of the sources is done by the Metropolis-Hastings 
scheme given in Table To deal practically with uniformly 
distributed positive variables, we assume that they lie in the 
range [0.0001, 1000]. 

1) Initialization: We start the algorithm with the mixing 
matrix obtained by the FDCCA (Fourier Domain Correlated 
Component Analysis) f3T| method. The initial values of as- 
trophysical maps are obtained by Least Square (LS) solution 
with the initial mixing matrix. The initial values of ai,d can 
be calculated directly from image differentials. We initialized 
the Pid — O and found the initial value of 6i^d by equaling the 
expectation ( |33] | to a constant. In this study, we take the initial 
value of this posterior expectation 1.5. So the initial value of 

2) Stopping Criterion: We observe the normalized abso- 
lute difference between sequential values of s; to decide 
the convergence of the Markov Chain to an equilibrium. 
If |sf - sf~^|/|sf"^| < 10"^, we assume the chain has 
converged to the equilibrium for s; and denote this point 
Ti = k. Since we have L parallel chains for L sources, the 
ending point of the burn-in period of the whole Monte Carlo 
chain is Tg = max; T;. We ignore the samples before Tg. We 
keep the iteration going until Tg that is the ending point of the 
post burn-in period simulation. In the experiments, we have 
used 100 iterations after burn-in period, so = Tg + 100. 

VI. Simulation Results 

To test our procedure, we assume nine observation channels 
with center frequencies in the range 30-857 GHz, where the 
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Original LS (Initial) ALS-t GS-MRF 




Fig. 3. The separated astrophysical images from noisy observations witli tlie 
LS, ALS-t and GS-MRF metliods. The location of the patch is 0° longitude 
and 40° latitude, out off galactic plane, and has a size 64 X 64 pixels. 



dominant diffuse radiations are the CMB, the galactic syn- 
chrotron radiation and the thermal emission from galactic dust. 
Except for CMB, these radiations have unknown emission 
spectra (that is, the coefficients a^.i in ([T) are not all known). 
Both observation models in ^ and (|5]l are suitable for our 
algorithm. In the experiments, we assume the model in ([U, that 
is of instantaneous mixtures and isotropic sky, to be valid. We 
plan to attack the problems of space variability and channel- 
dependent convolutional effects in the future. 

In the sequel, we present astrophysical image separation re- 
sults on a comparative basis. The proposed method is denoted 
as ALS-t (Adaptive Langevin Sampler-t-distribution) and is 
compared to four other methods, namely: 1) GS-MRF, which 
is the MRF model coupled with Gibbs sampling |1|; 2) LS, 
which forms our initial estimates on the basis of the values of 
ak,i obtained by FDCCA 131]; 3) Iterated Conditional Modes 
(ICM), which maximizes the conditional pdfs sequentially for 
each variable [llj; 4) ALM-MRF, which is the solution of the 
MRF model via Langevin and MetropoUs-Hastings schemes 

nm. 

The leftmost column of Fig. |3] shows the ground-truth 
simulated astrophysical source maps. The remaining columns 
show the source maps separated by LS, ALS-t and GS-MRF, 
respectively. The sky patch used for this experiment is centered 
at 0° longitude and 40° latitude in galactic coordinates and 
has a size of 7.3 x 7.3 square degrees in the celestial sphere, 
discretized in a 64 x 64 pixel map. 

The Peak Signal-to-Interference Ratio (PSIR) is used as a 
numerical performance indicator. The PSIR can be calculated 
if the ground-truth is known, which is the case in our work 
since all sky components are simulated. For this patch, the 
algorithm converges after 155 iterations and uses a total of 
255 iterations to reach the solution (see Fig. |4]i. We compare 
the results with the ones of LS, ICM, GS-MRF and ALM- 
MRF 

Table |IV] lists the PSIR values and the process times. The 
simulations are run on a Core2 CPU L86 GHz PC. The process 
time of ALS-t is much shorter than that of the GS-MRF. The 
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Fig. 4. The PSIRs of the sources in the pixel domain as a function of iteration 
number. The vertical line signifies the starting point Ts- 



TABLE IV 

The PSIR (dB) values of the separated components and the 

PROCESS TIME OF THE ALGORITHMS IN MINUTES. 





CMB 


Synchrotron 


Dust 


time 


LS 


30.69 


15.03 


37.37 


1.32e-4 


ICM 


26.27 


17.64 


35.30 


0.31 


GS-MRF, Q] 


27.81 


22.33 


38.93 


226.72 


ALM-MRF, l22l 


27.91 


20.88 


36.41 


2.86 


ALS-t 


33.45 


26.21 


40.51 


1.65 



execution time of ALS-t is two orders of magnitude smaller 
than that of GS-MRF. The PSIR values of ALS-t are also 
over those of LS, ALM-MRF, GS-MRF and ICM, especially 
for synchrotron, and furthermore, the smoothing degradation 
of ICM on the synchrotron component is not observed in the 
proposed method , Fig. |3] 



Original LS (Initial) ALS-t 




95.05 dB 96.83 (tB 



Fig. 5 . The separated astrophysical images from noisy observations with the 
LS and ALS-t methods. The location of the patch is 20° longitude and 0° 
latitude, galactic plane, and has a size 128 X 128 pixels. 



TABLE V 

The PSIR IMPROVEMENTS (dB) with respect to initial LS 

SOLUTION. 





CMB 


Synchrotron 


Dust 


(0°,40°) 
(20°, 0°) 


3.01 
1.80 


10.01 
4.54 


4.08 
1.78 
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The PSIR 



TABLE VI 

spec (DB) values OF THE SEPARATED COMPONENTS, IN THE 
ANNULAR FREQUENCY DOMAIN. 



0°,40° 



20°. 0° 





CMB 


Synch. 


Dust 


CMB 


Synch. 


Dust 


LS 
ALS-t 


30.33 
35.87 


2.65 
26.23 


37.76 
40.69 


31.59 
34.53 


45.69 
46.98 


39.35 
38.93 



We have also run the algorithm for 128 x 128 pixels patches 
centered at (0°, 40°) and (20°, 0°). Fig.|5]shows the results for 
the patch (20°, 0°). In that patch, the relative intensity of CMB 
is the weakest one. The PSIR values of the estimates of LS and 
ALS-t are written under the maps and the PSIR improvements 
are listed for that patch and for the patch (0°,40°) in Table 
|Vl The total time of the ALS-t algorithm for the 128 x 128 
size patches is about 5.31 minutes. 

We also use an alternative performance criterion, defined 
in the spherical harmonic (frequency), i, domain, since the 
angular power spectrum is relevant to astrophysics. If we 
decompose a CMB map on spherical harmonics, the com- 
plex coefficients, cgm — 0,1,2,..., m e [—£,£]), define 

the angular power spectrum, C{1), as the average C{t) ~ 
1 * 

In Fig. |6] we plot the standard power spectrum, C{i), 
defined as C{t) = {i + 1)IC{1)I2ti of the original and the 
reconstructed sources in the two patches considered. In order 
to compare different methods, we also introduce the Peak 
Signal-to-Interference Ratio in the ^-domain defined as 



PSIRspec = '2Q\0: 



( ^v^/2 + 1 X max(C'(^))\ 



(41) 



where C{£) is the estimated power spectrum. 

In the off-galactic patch considered in Fig. |6] the intensity 
of synchrotron is very low and the LS solution for synchrotron 
is contaminated too much by noise. The estimated CMB and 
the dust spectrums by ALS-t follow the ground-truth spectrum 
better than the LS one, especially in the high frequency 
regions. For the patch (20°, 0°), synchrotron and dust are 
estimated adequately by LS, but the LS estimate of CMB is 
improved by ALS-t. The related PSIRspec values are presented 
in Table [VT] 

We have observed that the estimated regression parameter 
ai^d is quite isotropic for all the maps. For the CMB map in the 
(0°,40°) patch, its value is about 0.88 for all d. In the same 
patch, the values of the parameter ai,d for synchrotron and 
dust are 0.99. These results show us that the CMB radiation 
is spatially less correlated than the other radiation sources. We 
assume the parameter (3i^d is isotropic and estimate a single 
value for each direction. The EM estimation of /J; ^ depends 
too much on its prior and initial value. We have allowed 
the parameter Si d to be anisotropic, but at the end of the 
estimation steps we have found that it is almost isotropic for 
all radiation maps. 

VII. Conclusion and Future Work 

We have developed a Bayesian source separation algorithm 
for astrophysical images where the MCMC samples are gener- 




Gtound-tmlh 




(a) CMB, patch {0°,40°) 



(b) CMB, patch (20°, 0°) 



— Groundtaih 



Ground-trulh 



(c) Synchrotron, patch (0° , 40° 



(d) Synchrotron, patch (20°, 0°) 



Ground -truth 



Ground-lruth 




(e) Dust, patch (0°,40°) 



(f) Dust, patch (20°, 0°) 



Fig. 6. Ground-truth and estimated angular power spectrums for patches 
(0°,40°) and (20,0). The ground-truth spectrum (solid line), spectrum of 
LS solution (dot line) and spectrum of ALS-t solution (solid hne marked with 
+). 



ated through the Langevin stochastic equation. The proposed 
algorithm provides two orders of magnitude computational 
economy vis-a-vis the Gibbs sampling approach. In addition, 
it generates better source separation as compared to all its 
competitors, i.e., LS, ICM, ALM-MRF and GS-MRF methods 
measured in terms of PSIR in the pixel domain and PSIRspec 
in the annular frequency domain. The algorithm can recon- 
struct the high frequency regions of the power spectrums with 
higher fidelity. A byproduct of this approach is the capability 
to estimate the parameters of the t-distribution image priors. 
Although the proposed ALS-t method takes longer than either 
ICM or LS methods, its superior performance by far outweighs 
this disadvantage, and furthermore the algorithm lends itself 
to parallel processing. To improve the algorithm performance, 
non- stationary image priors, more efficient discretization time 
step and diffusion matrix can be investigated in the future. 
Another point to obtain a beneficial algorithm might be the 
usage of more than one MH-steps, because the EM algorithm 
which estimates parameters converges faster than the Monte 
Carlo sampling scheme. 

Our new goal is the application of the proposed algorithm 
to whole-sky maps. To avoid the difficulties inherent in this 
problem, we plan to use the "nested numbering" structure 
provided by the HEALPix |37| package. In this format, we can 
reach the indexes of the eight neighbors of each pixel on the 
sphere. To calculate the pixel differences, we will implement 
a gradient calculation method on the sphere by taking the 
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non-homogeneous spatial distances between the pixels on the 
sphere into consideration. 

Other issues to be addressed are the channel-dependent 
blurring effects of the antennas and the non-stationary nature 
of noise. We have to reformulate the source separation problem 
without these simplifying assumptions on the observations. 
Finally, a pixel-based estimation error is being analyzed with 
the goal of defining a stopping criterion for the algorithm. 
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